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Abstract 

We provide a framework to bound the probability that accumulated errors were never above a 
given threshold on numerical algorithms. Such algorithms are used for example in aircraft and nu- 
clear power plants. This report contains simple formulas based on Levy’s and Markov’s inequalities 
and it presents a formal theory of random variables with a special focus on producing concrete re- 
sults. We selected four very common applications that fit in our framework and cover the common 
practices of systems that evolve for a long time. We compute the number of bits that remain continu- 
ously significant in the first two applications with a probability of failure around one out of a billion, 
where worst case analysis considers that no significant bit remains. We are using PVS as such formal 
tools force explicit statement of all hypotheses and prevent incorrect uses of theorems. 


1 Introduction 

Formal proof assistants are used in areas where errors can cause loss of life or significant financial 
damage, as well as in areas where common misunderstandings can falsify key assumptions. For this 
reason, formal proof assistants have been much used for floating point arithmetic [1, 2, 3, 4, 5] and 
probabilistic or randomized algorithms [6, 7]. Previous references link to a few projects using proof 
assistants such as ACF2 [8], HOF [9], Coq [10] and PVS [1 1]. 

All the above projects that deal with floating point arithmetic aim at containing worst case behavior. 
Recent work has shown that worst case analysis may be meaningless for systems that evolve for a long 
time, as encountered in industry. A good example is a process that adds numbers in ±2 with a measure 
error of ±2 -24 . If this process adds 2 25 items, then the accumulated error is ±2, and note that 10 hours 
of flight time at operating frequency of 1 kHz is approximately 2 25 operations. Yet we easily agree 
that provided the individual errors are not correlated, the actual accumulated errors will continuously be 
much smaller than ±2. 

We present in Section 2 a few examples where this work can be applied. We focus on applications 
for n counting in billions and a probability of failure of about one out of a billion. Should one of these 
constraints be removed or lessened, the problems become much simpler. The main contributions of 
this work are the selection of a few theorems amenable to formal methods in a reasonable time, their 
application to software and systems reliability, and our work with PVS. Section 3 presents the formal 
background on probability with Markov’s and Fevy’s inequalities and how to use this theory to assert 
software and system reliability. 

Doob-Kolmogorov’s inequality was used in previous work [12]. It is an application of Doob’s in- 
equality, but it can be proved with elementary manipulations for second order moments. It is better than 
Fevy’s inequality in the sense that it can applied to any sum of independent and centered variables. Yet 
it is limited by the fact that it bounds only second order moments. 

2 Applications 

Fevy’s inequality works with independent symmetric random variables, as we safely assume in Sec- 
tions 2.1 and 2.2. The applications presented in Section 2.3 cannot be treated by Fevy’s inequality. Yet 
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Listing 1 : Accumulation of n values, which can also be viewed as a dot product with ci, — bj x a 


1 ao = 0 ; 

2 for 0=1; i<=n ; i = i+ 1) 

3 a, = i © di ; // is replaced by a, = a,_i + dj + Xj 

4 // to accommodate round-off errors and existing errors on dj 


we may obtain similar results with Doob’s inequality which is not restricted to symmetric variables. 
Alas, we foresee that the time needed to develop Doob’s inequality in any of the formal tools available 
today is at least a couple of years. Automatic treatment of all the following applications may use interval 
arithmetic that has been presented in previous publications and is now available in formal tools [4, 5]. 

2.1 Long accumulations and dot products 

A floating point number represents v = m x 2 e where e is the exponent, an integer, and m is the mantissa 
[13]. IEEE 754 standard [14] on floating point arithmetic uses sign-magnitude notation for the mantissa 
and the first bit bo of the mantissa is implicit in most cases (bo = 1). leading to the first definition in 

equation (1). Some circuits such as the TMS320 [15] use two’s complement notation for m, leading to 

the second definition in equation (1). The sign s and all the bj are either 0 or 1 . 

v — (—iy x bo.b\ ■ ■ ■ b p -i x2 e or v = (bo-bi -b p -\ — 2 x s) x 2 e (1) 

In fixed point notation e is a constant provided by the data type and bo cannot be forced to 1 . We define 

for any representable number v, the unit in the last place (ulp) function below, with the notations of 
equation (1). 

ulp(v) - 2 e ~i’ +1 

The example given in Listing 1 sums n values, a n ). When the accumulation is performed 

with floating point arithmetic, each iteration introduces a new round-off error Xj. One might assume that 
Xj follows a continuous or discrete uniform distribution on the range ±u with u = ulp(a,)/2, as trailing 
digits of numbers randomly chosen from a logarithmic distribution [16, pp. 254-264] are approximately 
uniformly distributed [17]. A significantly different distribution may mean that the round-off error con- 
tains more than trailing digits. 

Errors created by operators are discrete and they are not necessarily distributed uniformly [18]. As 
they are symmetric, we only have to bound the moments involved in our main result, as in equation (2). 


Mean 

Variance 

4 th order moment 

6 th order moment 

8 th order moment 

E (Xj) = 0, 

E(Xf) < y, 

E(tf)< j, 

E(*f)<y, 

E& 8 )<f 


If a, uses a directed rounding mode, we introduce Xj — Xj — E( A, j and we use equation (2) again. We 
may also assume that dj carries some earlier round-off errors. In this case, Xj is a linear combination of 
i round-off errors satisfying equation (2) for a given u. 

If dj is a data obtained by an accurate sensor, we may assume that the difference between d, and the 
actual value dj follows a normal distribution very close to a uniform distribution on the range ±u, with 
some new value of u. In this case, we model the error dj — dj by a symmetric random variable and X, is 
the sum of two random variables (t = 2) satisfying equation (2) for a given u. 
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Table 1: Number of significant bits with a probability of failure P ( max (|5,-|) > £ ) bounded by P 

\\<i<n J 


u 

n 

i 

p 

2k 

e « 

Number of significant bits 

2 -24 

10 9 

2 

10“ 9 

2 

68.825 

-6.10 





4 

0.42832 

1.22 





6 

0.085786 

3.54 





8 

0.040042 

4.64 





44 

0.010153 

6.62 

2 -24 

10 9 

10 

10-to 

2 

486.66 

-8.92 





4 

1.7031 

-0.77 





6 

0.28156 

1.82 





8 

0.11939 

3.06 





48 

0.023873 

5.38 

u 

M ~ 3 / 2 

1 

M 3 / 2 

2 

{/4u~ 2 /9 

(l°g 2 m — 1 + log 2 3)/2 





4 

\/4u~ 1 /9 

(log 2 n — 2 + 2 log 2 3)/8 





6 

‘^100/81 

(2 log 2 3 — log 2 10) /6 





8 

y/4900n/729 

(- log 2 u - 2 log 2 70 + 6 log 2 3) / 1 6 


After n iterations and assuming that all the errors introduced are independent, we want the probability 
that the accumulated errors have exceeded some user specified bound e: 

, , „ Xj is the sum of i symmetric random 

P I max (\Si\) > £ <P with S n — V X, and variables satisfying equation (2) for a (3) 

V 1 <i<n ) 

x ~ 7 1=1 given u. 

Previous work used Doob-Kolmogorov’s inequality. We will see in Section 3 that we can exhibit 
tighter bounds using Levy’s inequality followed by Markov’s one. Table 1 presents the number of sig- 
nificant bits of the results (i.e. — log 2 e) for some values of u, n, l, and P in equation (3). These values 
are obtained by using one single value of u, as large as needed. Tighter results can be obtained by using 
a specific value of u for each random variable and each iteration. 

2.2 Recursive filters operating for a long time 

Recursive filters are commonly used in digital signal processing and appear, for example, in the programs 
executed by Flight Control Primary Computers (FCPC) of aircraft. Finite impulse response (FIR) filters 
usually involve a few operations and can be treated by worst case error analysis. However, infinite 
impulse response (HR) filters may slowly drift. 

The theory of signal processing provides that it is sufficient to study second order HR with coef- 
ficients b\ and /? 2 such that the polynomial X 2 — b\X — bj has no zero in IS. Listing 2 presents the 
pseudo-code of one such filter. A real implementation would involve temporary registers. 

When implemented with fixed or floating point operations, each iteration introduces a compound 
error Xj that is a linear combination of (' individual errors satisfying equation (2) for a given u. As these 
filters are linear, we study the response to do = 1 and dj — 0 otherwise, to deduce the accumulated effect 
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Listing 2: Infinite impulse response (HR) filter operated on n iterations 


1 y-1 = 0 ; yo - do ; 

2 for 0 = I ; i<=n; i = i-\- 1) 

3 yi=diQb\yi-\Qbiyi-2 ; // is replaced by y ( = dj +Xj - b x v,-i - biyt -2 

4 // to accommodate round-off errors and existing errors on dj 


of all the errors on the output of the filter. This response is defined as the sequence of real numbers such 
that 

_y_i = 0 , >’o = 1 , and y n = -b\y n -\ -b 2 y n -2 for all n> 1 . 

This sequence can also be defined by the expression 

y n — b n J~ \Jb 2 + 2b\ cos(ft>o + nft)) with constants (Oq and co. 

If the filter is bounded-input bounded-output (BIBO) stable, 0 < b2 < 1 and the accumulated effect of the 
round-off errors is easily bounded by \J b 2 + 2b\ / (1 — s/bj). Worst case error analysis is not possible 
on BIBO unstable systems. Our work and the example of Table 1 can be applied to such systems. 

2.3 Long sums of squares and Taylor series expansion of programs 

The previous programs introduce only first order effect of the round-off errors. We present here systems 
that involve higher order errors such as sum of square in Listing 3 that introduces Dj and power series of 
all the random variables as in equation 4. 

Assuming that dj carries an error X, in Listing 3, its contribution to the sum of square cannot be 
assumed to be symmetric. Levy’s inequality cannot be applied, but Doob’s inequality provides a similar 
result, though a large number of foundational results are still lacking in existing formal libraries. 

Listing 3: Sum of n squares 


1 no = 0 ; 

2 for 0=1; i<=n ; i = i+l) 

3 a,- = a,-_ i © di <E> dj ; // is replaced by cij = aj-\ +Xj + (dj + Dj) 2 = cij-i +Xj + 2djDj + df + Dj 

4 // to accommodate round-off errors and existing errors on dj 


The output of a system can always be seen as a function F of its input and its state (di,...,d q ). 
This point of view can be extended by considering that the output of the system is also a function of the 
various round-off errors (Zj , . . . . X n ) introduced at run-time. Provided this function can be differentiated 
sufficiently, Taylor series expansion at rank r provides that 

F(d\,... ,dq,X\,. . . ,X„) = F(d \ , . . . , dq, 0 , . . . , 0 ) 




m= 1 m ! \*=l 



■ ■ d q . () . 


, 0 ) 


( 4 ) 


E* 

0=1 


dF 

<)X, 




( d\ , . . . , dq, $1 5 ... 5 j 
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where 0, is between 0 and X,-, and (-)W is the symbolic power defined as 


I* 

0=1 


dF 

dX: 


L 


d F 

X ir --Xi , . 

1 m dX h ■ ■ ■ dXi 


When the Taylor series is stopped after m — 2, we can use Doob’s inequality to provide results 
similar to the ones presented in Table 1 , provided the X,- are symmetric and independent. Higher order 
Taylor series don't necessarily create sub-martingales but weaker results can be obtained by combining 
inequalities on sub-martingales. 


3 Formal background on probability 

3.1 A generic and formal theory of probability 

We rebuilt the previously published theory of probability spaces [12] as a theory of Lebesgue’s integra- 
tion recently became fully available. The new PVS development in Figure 1, still takes three parameters: 
T, the sample space, SF, a c-algcbra of permitted events, and P, a probability measure, which assigns to 
each permitted event in SF, a probability between 0 and 1 . Properties of probability that are independent 
of the particular details of T, SF , and P are then provided in this file. 

A random variable X is a measurable application from (T, SF') to any other measurable space ( T' . SF ' ) . 
In most theoretical developments of probability, T, SF, and P remain generic, as computations are carried 
on T'. Results on real random variables use T' = M whereas results on random vectors use T' — W. Yet 
both theories refer explicitly to the Borel sets of T', the elements of the smallest a-algebra containing the 
open sets of T'. 

As the Borel sets of M and JR" are difficult to grasp, most authors consider finite T and SF — RET) 
for discrete random variables in introductory classes. This simpler analysis is meant only for educational 
purposes. Handling discrete and continuous random variables through different T and SF parameters is 
not necessary and it is contrary to most uses of probability spaces in mathematics. Both discrete and 
continuous variables can be described on the same generic T, SF, and P parameters in spite of their 
differences. Similarly, many authors work on sections {X < x} rather than using the inverse images 
of Borel sets of T' because the latter are difficult to visualize. Such a simplification is valid thanks to 
Dynkin’s systems. But using abstract Borel sets rather than sections in formal methods often leads to 
easier proofs. 

3.2 A concrete theory of expectation 

The previous theory of random variables [12] made it possible to define them and to use and derive 
their properties. Very few results were enabling users to actually compute concrete results on random 
variables. Most of such results lie on a solid theory of the expected value. As most theorems in the latter 
theory are corollaries of a good theory of Lebesgue’s integration, we have developed a formal measure 
theory based on Lebesgue’s integration and we develop formal theorems on expected values, as needed 
in our applications. 

The expected value is the (unique) linear and monotonous operator E on the set of P-integrable 
random variables that satisfies Beppo-Levy’s property and such that E ( Xa ) — P(A) for all A € SF . We 
can also use the following definition when Lebesgue’s integral exists: 

E(X) = J X dP. 

Markov’s inequality below is heavily used to obtain concrete properties on random variables. 
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probabi 1 it y_ space [T : TYPE+ , (IMPORTING topology@subset_algebra.def [T] ) 

S : sigma.algebra, (IMPORTING probability .measure [T ,S] ) 

P : probability .measure] : THEORY 

BEGIN 

IMPORTING topology@sigma.alge bra [T ,S] , 
probability .measure [T,S] , 
continuous.functions.aux [real] , 
measure _theory@measure_space [T,S] , 
measure _theory@measure .props [T,S ,to_measure (P)] 

limit: MACRO [(convergence.sequences . convergent) ->real] 

= convergence.sequences . limit 

h : VAR borel.function 

A,B: VAR (S) 

x,y: VAR real 

nOz : VAR nzreal 

t : VAR T 

n : VAR nat 

X,Y: VAR random. variable 

XS: VAR [nat ->random_ variable] 

null? (A) :bool = P(A) = 0 

non.null? (A) :bool = NOT null? (A) 

independent? (A ,B) :bool = P(intersection(A ,B) ) = P(A) * P(B) 

zero: random.variable = (LAMBDA t: 0) 
one: random.variable = (LAMBDA t: 1) 

; % Needed for syntax purposes! 

<=(X,x): (S) = {t I X(t) <= x}; %<=>=>/= omitted 
complement _eq: LEMMA complement (X = x) = (X /= x) % More omitted 
; % Needed for syntax purposes! 

+ (X,x): random.variable = (LAMBDA t: X(t) + x) ; °l More omitted 

borel.comp_rv_is.rv: JUDGEMENT o(h,X) HAS.TYPE random.variable 
part ial.sum.is.random. variable : 

LEMMA random.variable? (LAMBDA t: sigma(0 ,n, LAMBDA n: XS(n)(t))) 

distribution.function? (F : [real- probability] ) :bool 

= EXISTS X: FORALL x: F(x) = P(X <= x) 
distribution.function: TYPE+ = (distribution.function?) CONTAINING 

(LAMBDA x: IF x < 0 THEN 0 ELSE 1 END IF) 
distribution.function(X) (x) probability = P(X <= x) 

convergence. in.distribut ion? (XS ,X) :bool 

= FORALL x: continuous (distribution.function(X) ,x) IMPLIES 

convergence ( (LAMBDA n: distribution_function(XS(n) ) (x) ) , 
distribution.function(X) (x) ) 


invert.distribution: LEMMA LET F = distribution.function(X) IN 

P(x < X) = 1 - F(x) 

interval.distribution: LEMMA LET F = distribution.function(X) IN 

x <= y IMPLIES 

P(intersection(x < X, X <= y)) = F(y) - F(x) 
limit.distribution: LEMMA LET F = distribution.function(X) IN 

P(X = x) = F (x) - limit (LAMBDA n: F(x-1/ (n+1) ) ) 


F: VAR distribution.function 

distribution.O : LEMMA convergence (F o (lambda (n:nat) : -n) ,0) 

distribution.l : LEMMA convergence (F, 1) 

distribution. increasing: LEMMA increasing? [real] (F) 

distribution.right. continuous : LEMMA right. continuous (F) 

END probability.space 


Figure 1 : Abbreviated probability space file in PVS 
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Theorem 1 (Markov’s inequality). For any random variable X and any constant £ > 0, 


P(|X| >£) < 


E(|X|) 

£ 


Many theorems relate to independent random variables and their proofs are much easier once inde- 
pendence is well defined. The family (X\,...,X n ) is independent if and only if, for any family of Borel 
sets 

p(n =mx,e Bi ). 

The following characteristic property is used a lot on families of independent variables: 

For any family of Borelean functions (h\, . . .,h„) such that the hfXj) are P-integrable, 

E (nw^ =n E (^(^))- 

It is worth noting that the fact that n random variables are independent is not equivalent to the fact 
that any pair of variables is independent, and cannot be built recursively from n — 1 independent random 
variables. 

Future work may lead us to implement a theory of the law associated to each random vector 
X : T — > M", with a "transfer” theorem for any Borelean function h : W — > M below, and most properties 
of Lebesgue’s integral including Fubini's theorem. 

E(h(X)) = [ h(X) dP= f hdP x 

./T ./ R" 

3.3 Almost certain a priori error bound 

What we are actually interested in is whether a series of calculations might accumulate a sufficiently large 
error to become meaningless. In the language we have developed, we are computing the probability that 
a sequence of n calculations has failed because it has exceeded the £ error-bound somewhere. 

Theorem 2 (Corollary of Levy’s inequality). Provided the (X n ) are independent and symmetric the 
following property holds for any constant £. 

P ( max (|S;|) > £ ) < 2P(|S„| > s) 

\l<i<n J 


Proof We use a proof path similar to the one published in [19]. We define s\P below with Dirichlet’s 
operator Sp, which is equal to 1 if the predicate holds and 0 otherwise. As the X n are symmetric, the 
random variables S n and S„ share the same probability density function. 

S ( „ j) - £(-l)^X ; 

i= 1 

We now define N — i n l { /c such that Si > £} with the addition that inf0 — +°o and similarly N 1 ' 1 = 
i nl{ /c such that S ^ 1 1 \ > £}. Events maxi<,<„(|5' i -|) > £ and N <n are identical. Furthermore, 

P(|s„| > 6) = £ P(|5„| >£n)V = j) = £P (\si j) \ > £ n N = . 

j= t i= t 
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As soon as j < n, 2 Sj — S n + S ^ and 2\Sj | ?=*;|S n | + |sj^ j . Therefore, the event { (A'y | > e} is included 
in {\S n \ > £}u{|sj/ , | > e} and 

r(N<n)^j^¥(\Sj\>e n j) < £ P(|S„| > e n N = j) + £ P (|S^| > e n N = j). 


/= i j= i 

This ends the proof of Levy’s inequality. 


;= t 


□ 


Should we need to provide some formula beyond the hypotheses of Levy’s inequality, we may have 
to prove Doob’s original inequality for martingales and sub-martingales [20] in PVS. It follows a proof 
path very different from Doob-Kolmogorov’s inequality but it is not limited to second order moment and 
it can be applied to any sub-martingale Sj k with k > I to lead to 

/ \ E f S' 2 *] 


Here, we use Markov’s inequality applied to S' 2 * in order to obtain the results of Table 1 : 

P(|S„| > e) = P (s 2k > e 2A ) < E (sf\ / £ 2k . 


Formulas 

E(S 2 ) — m 2 (j«) 

E(S 4 ) = m 4 ( \n+ j«(n — 1)) 

E(Sjj) r m u 6 (ljn + n(n — 1) + |n(n — l)(n — 2)) 

E(S 8 ) = m 8 (\n+ yj n(n — 1) + yn(n — l)(n — 2) + pn(n — l)(n — 2)(n — 3)) 

are based on the binomial formula for independent symmetric random variables 


E 



E (2*)! 

^i+^2 H b k n =k 


E (xf 1 ) E (xf 2 ) 

(2*i )! (2* 2 )! 


E(X 2 *») 

( 2 *,,)! ' 


Proof. We first prove the formula below by induction on n for any exponent m. 


e (s:) 


y E(X™ l )E(X™ 2 ) 

m\ +m2 J i \-m n =m fill* m 2 *. 


E(X/” n ) 

m „ ! 


It holds for n — 1 . We now write the following identity based on the facts that X„ are independent 
and symmetric. E (S“ +1 ) - E ((S„ + X„ + i) m ) is also equal to 


ml 


\ m n + 1=0 


1 rtf 

=o (m-m B+ i)!m„ + i! " +1 " 


ml 




;(2Cr)E(sr 


We use the induction hypothesis on E \ m,1+l j and we collapse the summations. We end the 

proof for the even values of m after noticing that E (X 2k 1 1 j = 0 for any i and any k, since the X n are 
symmetric. □ 
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4 Perspectives and concluding remarks 

To the best of our knowledge, this paper presents the first application of Levy’s inequality to software 
and system reliability of very long processes with an extremely low rate of failure. Our results allow any 
one to develop safe upper limits on the number of operations that a piece of numeric software should 
be permitted to undertake. In addition, we are finishing certification of our results with PVS. The major 
restriction lies in the fact that the slow process of proof checking has forced us to insist that individual 
errors are symmetric. 

At the time we are submitting this work, the bottleneck is the full certification of more results using 
PVS proof assistant. Yet this step is compulsory to provide full certification to future industrial uses. We 
anticipate no problem as these results are gathered in textbooks in computer science and mathematics. 
This library and future work will be included into NASA Langley PVS library 1 as soon as it becomes 
stable. 

The main contribution of this work is that we selected theorems that produce significant results for 
extremely low probabilities of failure of systems that run for a long time, and that are amenable to 
formal methods. During our work, we discarded many mathematical methods that would need too many 
operations or that would be too technical to be implemented with existing formal tools. 

Notice that this work can be applied to any sequence of independent and symmetric random variables 
that satisfy equation (2). It is worth pointing out one more time that violating our assumption (indepen- 
dence of errors) would lead to worse results, so one should treat the limit we have deduced with caution, 
should this assumption not be met. 
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